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The aim in Inigli-resolution connectomics is to reconstruct complete neuronal connectivity 
in a tissue. Currently, the only technology capable of resolving the smallest neuronal 
processes is electron microscopy (EM). Thus, a common approach to network 
reconstruction is to perform (error-prone) automatic segmentation of EM images, followed 
by manual proofreading by experts to fix errors. We have developed an algorithm and 
software library to not only improve the accuracy of the initial automatic segmentation, but 
also point out the image coordinates where it is likely to have made errors. Our software, 
called gala (graph-based active learning of agglomeration), improves the state of the art 
in agglomerative image segmentation. It is implemented in Python and makes extensive 
use of the scientific Python stack (numpy, scipy, networkx, scikit-learn, scikit-image, and 
others). We present here the software architecture of the gala library, and discuss several 
designs that we consider would be generally useful for other segmentation packages. We 
also discuss the current limitations of the gala library and how we intend to address them. 
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1. INTRODUCTION 

Connectomics, the elucidation of complete neuronal circuits, 
requires resolutions as low as 5-10 nm to distinguish the smallest 
neuronal processes, but also fields of view hundreds of microm- 
eters across or more, as neurons can easily span those distances. 
This size disparity results in large image volumes of at least 10 
gigavoxels and often orders of magnitude larger. Neurons are vis- 
ible as distinct regions, or segments, in this 3-dimensional image 
volume. Combining an accurate segmentation with the position 
of pre- and post-synaptic sites in the image (Kreshuk et al, 2011; 
lagadeesh et al., in press), one can obtain the shapes, locations, 
and connectivity of all the neurons in an image volume, as has 
been demonstrated in Helmstaedter et al. (2013) and Takemura 
etal. (2013). 

Various methods have been proposed (and implemented) 
to go from images to neuronal morphology and connectivity. 
Cardona et al. (2010) used manual tracing of the midline of 
neuronal processes, along with manual annotation of synapses, 
to analyze a neuronal circuit. Helmstaedter et al. (2011) refined 
this approach by having multiple individuals trace the same 
neuron, thus allowing them to estimate the error rate of their 
traces. Manual tracing alone, however, wiU not scale to the recon- 
struction of large neuronal circuits (Helmstaedter, 2013). An 
alternate strategy, then, has been to use automatic segmenta- 
tion of the neurons in the image volume, followed by human 
proofreading of this segmentation (Chklovskii et al, 2010). 
Until recently, this proofreading was the rate-limiting step for 



neuronal reconstruction from EM images, for two reasons: first, 
the segmentation algorithms were (and still are) orders of magni- 
tude too inaccurate to reconstruct even a single neuron without 
errors; and second, the human proofreaders had to examine 
every voxel of the image, even if the automatic segmentation 
is correct. In response, we developed a new machine learning- 
based algorithm for image segmentation (Nunez-Iglesias et al., 
2013) that provides state of the art automatic segmentation accu- 
racy and then directs proofreaders to likely areas of error in the 
segmentation. This has dramatically sped up proofreading and 
reconstruction speed (5-16-fold, in our anecdotal observations). 

The algorithm, called GALA (graph-based active learning of 
agglomeration), works by repeatedly consulting a gold standard 
segmentation (prepared by human annotators) as it agglomerates 
sub-segments according to its current best guess. (Note: through- 
out this paper, we will use "GALA" to describe the algorithm, and 
"gala" or "Gala" for the Python library and software.) It thus accu- 
mulates a training dataset used to fit a classifier, which guides 
subsequent agglomeration decisions. Furthermore, through the 
probability output of the classifier, it can estimate its own con- 
fidence in whether two segments should be merged, and this 
estimate can be used for proofreading. 

GALA outperformed previous agglomeration methods for 
automatic segmentation of an isotropic (10 x 10 x 10 nm resolu- 
tion) focused ion beam scanning electron microscope (FIBSEM) 
dataset of Drosophila larva neuropil (Nunez-Iglesias et al., 2013). 
What's more, it remained top-ranked for 8 months in the 
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FIGURE 1 I Two sample automatic segmentations performed with gala. 

(A) The SNEMI3D test data. The XZ plane (left) shows the initial, extremely 
oversegmented superpixel map, while the YZ (right) and XY (bottom) planes 
show the final segmentation returned by gala. Three complete neuronal 
segments are highlighted in 3D. Note that the segmentation is not 
perfect — stubs on the magenta 3D segment are candidates for missed 



branches, and a big false split is apparent on the YZ cut plane (arrows). 3D 
shape features should improve these results (Bogovic et al., 2013). (B) Our 
favorite fuzzball from the Berkeley Segmentation Data Set. Clockwise from 
top-left: original image, gPb boundary probability map (using the cubehelix 
colormap), watershed superpixels, and final GALA segmentation using 
threshold of 0.5. 



SNEMI3D challenge, which uses anisotropic neuronal EM data 
of 6 X 6 X 30 nm resolution, against tools developed specifically 
for anisotropic data and comprising about 70 submissions (Liu 
et al, 2012; Arganda-Carreras et al, 2013; Kaynig et al, 2013; Liu 
et al., 2014). Finally, thanks to the nD design of gala, we were 
able to segment 2D natural images and outperform state of the 
art agglomerative methods on the Berkeley Segmentation Data 
Set (BSDS) (Nunez-lglesias et al., 2013). See Figure 1 for two 
segmentation examples obtained with the gala library. 

Since we have described the details of the GALA algorithm in 
detail elsewhere (Nunez-lglesias et al., 2013), in this paper we 
focus on the design aspects of our implementation. We briefly 
present the Python application programming interface (API) and 
the command-line interface (CLI), before delving more deeply 
into particularly useful design choices, and finally discussing the 
current limitations of the library and future directions. 

2. API 

2.1. PYTHON API 

GALA belongs a class of segmentation algorithms called agglom- 
erative algorithms, in which segments are formed by merging 
smaller segments. Other examples include mean agglomeration 
(Arbelaez et al., 2010), the graphical models of Andres et al. 
(2012a,b), and Learning to Agglomerate Superpixel Hierarchies 
(LASH) (Jain et al, 2011), which is most similar to GALA. 
Agglomerative methods begin with an initial fine-grained seg- 
mentation known as an oversegmentation or superpixel map. The 
superpixel approach allows a massive reduction in computa- 
tional cost, enabling the use of more sophisticated algorithms in 
the agglomerative step. Additionally, it allows the use of differ- 
ent strategies to group pixels and regions, which may have very 
different properties (Ren and Malik, 2003). 

GALA uses machine learning to obtain a merge priority func- 
tion or policy, which dictates which pair of segments to merge 
next. It has three main prerequisites to learn this function: 



• a superpixel map (or supervoxel), an initial fine-grained seg- 
mentation; and 

• a gold standard segmentation, that represents the true segmen- 
tation of a training volume; and 

• a pixel-level intensity map, which is optional but required 
for most features. This is usually the probability of boundary 
between segments, but can be other things, such as the proba- 
bility of the pixel belonging to a glial cell, or to an image of a 
cat (Le et al., 2012). The map can even be the input image itself 
Indeed, gala allows multi-channel pixel-level maps that are the 
concatenation of some or all of the above maps. 

For the first requirement, we have used the watershed algorithm 
(Vincent and Soille, 1991), but other methods, such as SLIC 
(Achanta et al, 2012) would work. Gala itself contains an imple- 
mentation of watershed, although for some parameter sets we just 
wrap the implementation in scikit-image, which is more 
efficient and works both in 2D and 3D. Again, the only require- 
ment here is that the input volume is partitioned into some 
integer-labeled regions, and that these regions do not cross true 
segment boundaries, at least approximately. The algorithm that 
generates this initial oversegmentation is not important. 

The second requirement is a completely segmented image to be 
used as ground truth. For neuronal EM images, we used ground 
truth segmentation generated with the open-source Raveler soft- 
ware (Olbris et al., in preparation), while a large ground truth 
body exists for natural images in the Berkeley Segmentation Data 
Set (BSDS) (Martin et al, 2001). For other images, such as 3D 
fluorescence microscopy images, generation of ground truth can 
be a laborious process. This has indeed become the rate-limiting 
step in a gala segmentation, so we are moving to eliminate this 
requirement so that only a subset of ground truth is needed (see 
discussion). 

For the final requirement, we have used Ilastik (Sommer et al., 
2011) in our own work on EM images, gPb (Maire et al., 2008) 
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for natural images, or the probability maps provided by Ciresan 
et al. (2012) for the SNEMI3D challenge. Gala is agnostic about 
the origin of the input probability maps, both theoretically and in 
this implementation. 

Given the above, we create a region adjacency graph, or RAG 
(implemented in gala . agglo . Rag) corresponding to the 
training superpixel and probability maps, and perform repeated 
training agglomerations of the superpixels while comparing 
against the ground truth (Rag. learn_agglomerate). This 
produces a training set, to which we can fit a classifier, which wiU 
then prioritize merges in a test volume to segment. These opera- 
tions are illustrated in the following code snippets, which can be 
run from gala's tests /example -data directory. 

First, we import the relevant gala modules and read in the data 
using the imio (image lO) module: 

# imports 

from gala import imio, classify, features, agglo, 
evaluate as ev 

# read in training data 

ground_truth_train = imio . read_h5_stack ( ' train-gt . 
Izf .h5' ) 

prob_map_train = imio . read_h5_stack (' train-pl . Izf .h5 ' ) 
watershed_train = imio . read_h5_stack (' train-ws . Izf . h5 ' ) 

Next, we create a feature manager. These can be concatenated 
using the Composite manager. Managers are covered in more 
detail in section 3.2. 

# create a feature manager 

fm = features .moments -Manager ( ) 

fh = features -histogram. Manager ( } 

fc = features -base . Composite (children^ [ fm, fh] ) 

Using the feature manager, watershed oversegmentation, ground 
truth segmentation, and probability map, we can create a region 
adjacency graph g_train and obtain a (features, labels) training 
dataset. 

g_train = agglo - Rag (watershed_train, prob_map_train, 
f eature_manager=f c ) 

{X, y, w, merges) = g_train - learn_agglomerate 

(ground_truth_train, fc) [0] 
y = y[:, 0] # gala has 3 truth labeling schemes, 

pick the first one 

With the training dataset, we can train a classifier, using scikit- 
learn syntax. Indeed, any scrkit-learn classifier can be used here. 

rf = classify. DefaultRandomForest {). fit (X, y) 

By composing the feature map and the classifier, we obtain a 
policy: a function whose input is a graph and two nodes (rep- 
resenting segments) and whose output is a number in [0, 1]. 

learned_policy = agglo . classif ier_probability ( fc , rf) 

This policy is then used to segment a test (previously unseen) vol- 
ume. We agglomerate the superpixels until the classifier returns a 
merge probability of 0.5, which corresponds to even odds that the 
merge is a true or false merge. (This assumes a well-calibrated 
classifier, meaning that the output corresponds to the probability 
of a sample feature vector belonging to the "-1-1" class. Bostrom 
(2008) showed random forests to be reasonably well-calibrated.) 



# get the test data and make a RAG with the trained 
policy 

prob_map_test,watershed_test = (map ( imio . read_h5_stack, 

[ 'test-pl-lzf .h5' , 
'test-ws.lzf -h5' ] ) ) 
g_test = agglo - Rag (watershed_test , prob_map_test , 

learned_policy , f eature_manager=f c) 
g_test . agglomerate ( 0 - 5 ) # best expected segmentation 
seg_testl = g_test - get_segmentation ( ) 

Because gala was created as research software, it implements a 
number of additional agglomerative segmentation algorithms, 
including mean boundary, oriented mean boundary (Arbelaez 
et al., 2010), median boundary, superpixel affinity learning (Ren 
and Malik, 2003) (which we also call "flat" learning), and 
LASH (Jain et al, 2011). These are invoked simply by using a 
different merge_prioritY_f unction keyword argument, 
or calling agglo . Rag . learn_agglomerate with different 
parameters. 

The gala API presents a simple tool to obtain state of the 
art segmentations, and also allows the exploration of a com- 
plete set of hierarchical agglomerative segmentation strategies. 
Further, because the segmentation strategy is learned, it can be 
applied with very little modification to many different domains, 
as demonstrated by its success in natural image segmentation as 
well as two different kinds of EM data. 

2.2. SEGMENTATION EVALUATION MODULE 

One of the most generally reusable parts of the gala library 
is the evaluation module in gala/ evaluate . py. It offers 
efficient implementations of edit distance. Rand Index (Rand, 
1971), Adjusted Rand Index (Hubert and Arable, 1985), Fowlkes- 
Mallows index (Fowlkes and Mallows, 1983), and Variation of 
Information (VI) (Meila, 2005). 

Among these, we focused the most effort on the VI met- 
ric for its numerous advantages (Meila, 2005; Nunez-lglesias 
et al., 2013). VI is an information theoretic measure to compare 
clusterings or segmentations. It precisely answers the following 
question: given the identity of a point in segmentation B, how 
much information, on average, will I need to know its iden- 
tity in segmentation A, and vice-versa? If the two segmentations 
match perfectly, the answer is 0 bits: no additional information 
is required if the identity in A is the same as the identity in B in 
every case. Formally, it is the sum of the conditional entropy of A 
given B and the conditional entropy of B given A: 

VI (A, B) = H (A\B) + H {B\A) (1) 

When A is our automatic segmentation and B is the gold stan- 
dard, H(A\B) measures the oversegmentation or false splits, and 
H(B\A) measures the undersegmentation or false merges. 

The two components of VI are computed efficiently (0(«pixeis) 
time complexity) with the evaluate . split_vi function. 

gt_test = imio . read_h5_stack (' test-gt . Izf .h5 ' ) 
import numpy as np 

results = np-vstack(( 

ev- split_vi (watershed_test, ground_truth_test) , 
ev- split_vi ( seg_testl , ground_truth_test ) 
) ) 

print (results ) 
[[ 0-1845286 1-64774412] 
[ 0-33793257 0-28697057]] 
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This is interpreted as follows: the undersegmentation VI of 
the watershed superpixels compared to the ground truth 
segmentation is 0. 1 8 bits. That is, the average watershed basin wiU 
have slightly over 97% overlap with one ground truth body and 
3% with another (since -0.97 logjCO. 97) - 0.03 log2(0.03) = 
0.19 ~ 0.185). In contrast, its oversegmentation VI is 1.64 bits, 
which means that most ground truth segments are split into 
more than 3 watershed pixels. (The conditional entropy of a per- 
fect (1/3, 1/3, 1/3) split is 1.58 bits. A perfect 50-50 split has an 
entropy of 1 bit.) 

By computing the two conditional entropies at each segmen- 
tation threshold, we generate the split VI plot (Figure 2) that we 
introduced in Nunez-lglesias et al. (2013), showing the trade- 
off between oversegmentation and undersegmentation. In this 
plot, the X-axis is the undersegmentation conditional entropy, 
measuring false merges, and the y-axis is the corresponding 
oversegmentation measurement. Agglomerative segmentations 
begin somewhere close to the y-axis (lots of oversegmenta- 
tion but very little undersegmentation). Then, a correct merge 
results in a downward move along the plot, while an incorrect 
merge causes a rightward move. The goal of a good segmen- 
tation algorithm, then, is to get as close as possible to (0, 0), 
a perfect match between automatic segmentation and ground 
truth. 

This approach is in contrast to the commonly used plot of VI 
against segmentation threshold (see e.g., Andres et al., 2012b), 
which obscures the tradeoff information. 

2.3. COMMAND-LINE INTERFACE 

In addition to the flexible Python library interface, we developed a 
set of command-line scripts to perform common gala functions. 



such as training and segmenting. This is the primary way of 
interacting with gala in a production environment. The scripts 
make use of the excellent argparse module, so usage can be 
determined by running the scripts with the -h or -help flags. Each 
option can be provided on the command line or through a JSON 
configuration file format. 

3. DESIGN HIGHLIGHTS 

In this section, we focus on a few design elements that we consider 
essential to gala's success. 

3.1. N-DIMENSIONAL ARRAY SUPPORT 

Many segmentation libraries assume 2D or 3D data, or provide 
separate functions for each (see, for example, Achanta et al., 2010, 
or many OpenCV functions). 

We instead abstracted away the notion of a neighboring 
pixel (or voxel) with a get_neighbor_idxs function that 
depends only on the pixel coordinates, the shape of the array, 
and a connectivity parameter. As all operations in our algo- 
rithm depend only on the definition of the local neighborhood, 
this abstraction made gala dimension-agnostic. This allowed us 
to produce segmentations of the Berkeley segmentation dataset 
(Martin et al, 2001) and a 3D EM dataset from the same 
code base. 

A great many algorithms in computer vision can be parame- 
terized by neighboring voxels. Thus, we encourage developers to 
write these using n-dimensional logic from the start to increase 
the range of applications of their software. The numpy library's 
excellent ndarray object was essential for our n-dimensional 
support, making gala a prime example of the success of the 
Python ecosystem for scientific computing. 





0.1 0.2 0.3 0.4 
false merges (bits) 



FIGURE 2 I Segmentation results for the focused ion beam scanning 
electron microscopy (FIBSEM) dataset of Drosophila melanogaster larva 
brain in Nunez-lglesias et al. (2013). (A) Example automatic segmentation 
of an octant of the dataset. The complex shapes of the three highlighted 
segments illustrate the difficulty of segmenting neuronal data. (B) Split VI 
plot from Nunez-lglesias et al. (2013), showing superior segmentation 



accuracy by gala over competing agglomerative algorithms. Lower and to the 
left is better. The stars indicate the point of lowest VI, and the circles indicate 
the point at threshold 0.5. Shaded areas show standard error of the mean for 
n = 56 observations ("flat," "agglo") or n = 8 observations ("mean"). The 
point labeled "best" represents the VI of a perfect merging of the initial 
(imperfect) superpixels. 
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3.2. FEATURE MANAGERS AND FEATURE CACHES 

An example of a critical feature when determining the probabil- 
ity that two segments should be merged is the average pixel-level 
probability of boundary (Ren and Malik, 2003). As segments are 
merged, the shared surface between them and their common 
neighbors increases, thereby making the average a more reliable 
estimate of the true probability of a boundary. Recomputing the 
mean from scratch, however, results in quadratic time complex- 
ity, because we are repeatedly iterating over the same pixels after 
each merge. Therefore, a more efficient strategy is to cache a sum 
of the pixel probabilities and the count of pixels examined. Then, 
when two boundaries are combined, their probability sums are 
added together, as are their counts, and the new average probabil- 
ity can be computed by dividing the new sum by the new count, 
in constant time. This caching turns a quadratic operation into a 
linear one. 

The above concept can be generalized to a large class of fea- 
tures. We found that, in many cases, caching intermediate com- 
putations dramatically improved the time complexity of feature 
computation. For example, to compute the standard deviation of 
the pixel probabilities, we must cache the sum of the squared pixel 
probabilities, along with simple sum and counts. To compute a 
histogram, we cache the unnormalized histogram, and each bin is 
summed when two boundaries are combined. 

We therefore devised a single class, which we call a feature 
manager, that is responsible for defining the cached values, and 
for computing the feature vector from the cached values. This 
has enabled less obvious features, including, for example, some 
based on the convex hull of the segment. The convex hull feature 
manager stores as a cache the convex hull of each node. When 
two nodes are merged, the resulting convex hull can be computed 
faster by starting from the two initial hulls, rather than from the 
newly formed segment, since these have fewer vertices than the 
segments themselves. The manager then uses the hull to compute 
features such as segment convexity, by comparing the volume of 
the convex hull to the volume of the segment. 

We have strived to make it easy to develop new feature man- 
agers, which will be useful as new, more sophisticated 3D segmen- 
tation features are developed (Bogovic et al., 2013). A GitHub puU 
request creating a new manager can be found at: https://github. 
com/jni/ray/puU/5 1 

3.3. CLASSIFIER ABSTRACTION 

Given the vast heterogeneity of our initial feature space, we 
wished to use a random forest (RF) as our classifier of choice, 
scikit-learn, the present gold-standard in machine learn- 
ing Python libraries, did not contain a RF implementation when 
we started building gala. We therefore decided to use Vigra, a 
C-|~l- image analysis and machine learning library with Python 
bindings. However, we recognized that a cross-compatible inter- 
face across libraries would allow rapid testing of various machine 
learning techniques. We therefore built a wrapper around 
Vigra to match scikit-learn's estimator interface, par- 
ticularly the fit ( ) , predict ( ) , and predict_proba ( ) 
methods. 

Because of this, it is trivial to try different classifiers for 
the learning and agglomeration steps of gala. In particu- 
lar, we have been able to use the recently vastly improved 



RandomForestClassif ier from scikit-learn version 
0.14 with no code changes. 

In short, by using established interfaces, we were able to future- 
proof our software. We recommend that anyone looking to build 
software in the Python ecosystem take a long look at related 
libraries to match interfaces as closely as possible. 

4. DISCUSSION 

Although we have discussed many of gala's strengths and its pos- 
itive design aspects, it does currently have limitations, which we 
describe here, along with potential fixes. 

4.1. COMPLETE GOLD STANDARD REQUIREMENT 

As currently implemented, gala requires a fully segmented vol- 
ume from which to learn. In our experiments, this has become 
the major bottleneck when starting segmentations on new data. 
Therefore, a priority in gala development going forward is the 
ability to mask volumes so that partial ground truth can be used. 

4.2. MEMORY AND TIME INEFFICIENCY 

Gala's implementation, based on NetworkX, is slow and has a high 
memory footprint. However, many improvements are within easy 
reach. 

Firstly, we currently store feature caches and compute feature 
vectors as separate arrays. This results in a huge time overhead for 
large graphs due to memory allocation, and also in memory usage 
because of the dictionaries required to store all the separate arrays. 
However, because we are performing a hierarchical agglomera- 
tion, we know that the number of nodes and edges is bounded by 
twice the initial number. Therefore, we can pre-allocate an initial 
array of shape (2 * n_nodes, cache_size) for the node 
feature caches, and similarly for the edges, and use an incremen- 
tal indexing scheme to keep track of which node or edge in the 
hierarchy uses which row of the array. 

Additionally, the graph currently stores indices to the vox- 
els comprising each node and boundary, which is unnecessary. 
Space and time can be saved by keeping only a single voxel and 
rebuilding nodes using a flood fill. 

Finally, we chose the heavy Graph class of the NetworkX 
library for its flexibility and fast node addition and removal. 
However, this is ultimately unnecessary: we can store the origi- 
nal supervoxel graph using a much more efficient structure, such 
as scipy. sparse . csc_graph, and maintain a merge tree. 
The graph at any level of the hierarchy can be rapidly constructed 
from this. 

5. CONCLUSIONS 

Like most academic software, gala is a mixture of new algorithms, 
some good design, and a variety of questionable decisions left 
over from a time of different priorities. We wrote this description 
in the hope that the existing and future functionality, the better 
parts of the software, and the lessons learned will be of value to 
the wider research community. We particularly emphasize that 
Don Knuth's famous maxim that "premature optimization is the 
root of all evil" (Knuth, 1974) should not be taken to extremes: 
in our case, this has led to time and memory performance issues 
that have been difficult to resolve. Still, gala's success in segment- 
ing not only the isotropic EM volume for which it was designed 
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(Glasner et al, 2011; Nunez-lglesias et al, 2013), but also the 
BSDS natural image dataset and the SNEMI3D anisotropic EM 
dataset, suggests that it will be useful for some time to come. In 
future work, we will explore the potential of gala to segment other 
kinds of neuronal data, including 3D light microscopy data and 
3D-|-t neuronal activity data. 
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